Previously, the client implemented a Kaplan-Meier survival curve to model the survival predictions across different types of dementia.
Client’s Aims
The client has two aims for this project. The first is to produce an interactive graph that provides a comprehensive exploration of the time from formal diagnosis to death based on four key covariates: age at diagnosis, sex, disease duration at diagnosis, and overall cognitive performance measured by the ACE-III test. These covariates were selected by the client based on their expertise and the availability of data. The second aim is to explore additional variables, including those from the Cambridge Behavioural Inventory-Revised carer questionnaire, family history of dementia as defined by a Goldman Score, and cognitive abilities. The client suspects these variables may influence survival rates and wants more clarity on the extent of the impact each factor has on survival.
Model
To meet the client’s objective of understanding how various covariates affect different dementia patient groups, we implemented a Cox Proportional Hazards regression model for each type of dementia, serving as the foundation of the interactive survival graph. This approach was informed by previous client analyses that highlighted statistically significant survival differences across dementia diagnoses. Each model accounted for age at diagnosis, sex, disease duration at diagnosis, and overall cognitive performance as assessed by the ACE-III test. While the Cox regression model does not assume any underlying data distribution, it does require other assumptions to be satisfied (details in the appendix).
Interactive Survival Graph
The interactive survival graph is designed to facilitate the analysis of different patient groups by inputting their specific data. This feature allows users to observe and compare how various factors impact survival rates across different groups. As shown in the graph, users can select patient attributes such as age at diagnosis, disease duration, ACE-III total scores, and gender, along with specific dementia diagnoses. By adjusting these parameters, the graph dynamically illustrates the survival probabilities over time, enabling a detailed comparison between two distinct patient groups.
Patient Group 1
Patient Group 2
Significance of variables
Warning: NAs introduced by coercion
Warning: NAs introduced by coercion
---title: "STAT3926/STAT4026: Statistical Consulting"subtitle: "Survival Analysis"author: - "Prepared by: 510288769" - "Prepared for: David Foxe"title-block-banner: "#d85f33"date: "`r format(Sys.time(), '%d %B, %Y %H:%M')`"format: html: theme: light: [united, style/custom_styles.scss] dark: [darkly, style/custom_styles.scss] embed-resources: true code-fold: true code-tools: true includes: in-header: style/www/header.html unsafe: true smooth-scroll: truetable-of-contents: truenumber-sections: falseserver: shiny---```{r setup, messages = FALSE, echo=FALSE, warning= FALSE}suppressMessages({ suppressWarnings({ library(purrr) library(shiny) library(survminer) library(readxl) library(DT) library(ggplot2) library(tidyverse) library(broom) library(survival) library(dplyr) library(kableExtra) library(scales) library(ggcorrplot) })})```## Executive Summary```{r}data = readxl::read_xlsx("240508_Survival_project_for_maths_students_opened.xlsx")data$`Sex (Male 1, Female 2)`=as.factor(data$`Sex (Male 1, Female 2)`)data[data ==999.0] <-NAdata = data %>%filter(!Diagnosis_number ==2.5)mapping_vector <-c("AD", "lvPPA", "bvFTD", "svPPA", "nfvPPA", "nfvPPA + Parkinson’s plus", "CBS", "PSP", "FTD-MND")names(mapping_vector) <-c("1", "10", "2", "16", "12", "4", "3", "14", "9")data$Diagnosis_name <- mapping_vector[as.character(data$Diagnosis_number)]```## BackgroundPreviously, the client implemented a Kaplan-Meier survival curve to model the survival predictions across different types of dementia.## Client's AimsThe client has two aims for this project. The first is to produce an interactive graph that provides a comprehensive exploration of the time from formal diagnosis to death based on four key covariates: age at diagnosis, sex, disease duration at diagnosis, and overall cognitive performance measured by the ACE-III test. These covariates were selected by the client based on their expertise and the availability of data. The second aim is to explore additional variables, including those from the Cambridge Behavioural Inventory-Revised carer questionnaire, family history of dementia as defined by a Goldman Score, and cognitive abilities. The client suspects these variables may influence survival rates and wants more clarity on the extent of the impact each factor has on survival.## Model To meet the client's objective of understanding how various covariates affect different dementia patient groups, we implemented a Cox Proportional Hazards regression model for each type of dementia, serving as the foundation of the interactive survival graph. This approach was informed by previous client analyses that highlighted statistically significant survival differences across dementia diagnoses. Each model accounted for age at diagnosis, sex, disease duration at diagnosis, and overall cognitive performance as assessed by the ACE-III test. While the Cox regression model does not assume any underlying data distribution, it does require other assumptions to be satisfied (details in the appendix). ```{r}subset_columns <-c("Diagnosis till death", "Age at diagnosis", "Sex (Male 1, Female 2)", "Disease duration at diagnosis", "ACEIII::ACEIIITotal", "Diagnosis_number")# Remove rows with missing values in the specified subset of columnsdata_cleaned <- data %>%filter(rowSums(is.na(select(., all_of(subset_columns)))) ==0)data_cleaned = data_cleaned %>%select(c("Diagnosis till death", "Age at diagnosis", "Sex (Male 1, Female 2)", "Disease duration at diagnosis", "ACEIII::ACEIIITotal", "Diagnosis_number", "Diagnosis_name"))``````{r}model_list =list()test_ph_list =list()covariate_names =c("Age at diagnosis", "Sex", "Disease duration at diagnosis", "ACE-III Total", "GLOBAL")unique_dx = data_cleaned$Diagnosis_name %>%unique()for (dx in unique_dx) { filtered_data = data_cleaned %>%filter(Diagnosis_name == dx) cox_model <-coxph(Surv(`Diagnosis till death`) ~`Age at diagnosis`+`Sex (Male 1, Female 2)`+`Disease duration at diagnosis`+`ACEIII::ACEIIITotal`, data = filtered_data, model =TRUE) zph <-cox.zph(cox_model) tidy_zph <-data.frame(Covariate = covariate_names,Chisq = zph$table[, "chisq"],`Degrees of Freedom`= zph$table[, "df"],`p-value`= zph$table[, "p"],check.names =FALSE ) test_ph_tidy = tidy_zph %>%mutate(Chisq =round(Chisq, 2),`p-value`=round(`p-value`, 3) )row.names(test_ph_tidy) <-NULL model_list[[as.character(dx)]] = cox_model test_ph_list[[as.character(dx)]] = test_ph_tidy}```## Interactive Survival GraphThe interactive survival graph is designed to facilitate the analysis of different patient groups by inputting their specific data. This feature allows users to observe and compare how various factors impact survival rates across different groups. As shown in the graph, users can select patient attributes such as age at diagnosis, disease duration, ACE-III total scores, and gender, along with specific dementia diagnoses. By adjusting these parameters, the graph dynamically illustrates the survival probabilities over time, enabling a detailed comparison between two distinct patient groups. ```{r}#| panel: fillplotOutput('plot')```::: {layout="[[1,1]]"}```{r}#| panel: inputh5("Patient Group 1")numericInput('age_1', 'Age at Diagnosis', 70, min =1, max =100)numericInput('duration_1', 'Disease duration at diagnosis', 3, min =1, max =100)numericInput('ace_total_1', 'ACEIII Total', 60, min =1, max =100)radioButtons("sex_1", "Sex", c("Male", "Female"), "Female", inline=TRUE)radioButtons("diagnosis_1", "Diagnosis", c("AD", "lvPPA", "bvFTD", "svPPA", "nfvPPA", "nfvPPA + Parkinson’s plus", "CBS", "PSP", "FTD-MND"), inline=TRUE)``````{r}#| panel: inputh5("Patient Group 2")numericInput('age_2', 'Age at Diagnosis', 70, min =1, max =100)numericInput('duration_2', 'Disease duration at diagnosis', 3, min =1, max =100)numericInput('ace_total_2', 'ACEIII Total', 60, min =1, max =100)radioButtons("sex_2", "Sex", c("Male", "Female"), inline =TRUE)radioButtons("diagnosis_2", "Diagnosis", c("AD", "lvPPA", "bvFTD", "svPPA", "nfvPPA", "nfvPPA + Parkinson’s plus", "CBS", "PSP", "FTD-MND"), inline=TRUE)```:::```{r}#| context: serverlibrary(survival)library(tidyverse)library(survminer)library(pammtools)data = readxl::read_xlsx("240508_Survival_project_for_maths_students_opened.xlsx")data$`Sex (Male 1, Female 2)`=as.factor(data$`Sex (Male 1, Female 2)`)data[data ==999.0] <-NAdata = data %>%filter(!Diagnosis_number ==2.5)mapping_vector <-c("AD", "lvPPA", "bvFTD", "svPPA", "nfvPPA", "nfvPPA + Parkinson’s plus", "CBS", "PSP", "FTD-MND")names(mapping_vector) <-c("1", "10", "2", "16", "12", "4", "3", "14", "9")data$Diagnosis_name <- mapping_vector[as.character(data$Diagnosis_number)]subset_columns <-c("Diagnosis till death", "Age at diagnosis", "Sex (Male 1, Female 2)", "Disease duration at diagnosis", "ACEIII::ACEIIITotal", "Diagnosis_number")# Remove rows with missing values in the specified subset of columnsdata_cleaned <- data %>%filter(rowSums(is.na(select(., all_of(subset_columns)))) ==0)data_cleaned = data_cleaned %>%select(c("Diagnosis till death", "Age at diagnosis", "Sex (Male 1, Female 2)", "Disease duration at diagnosis", "ACEIII::ACEIIITotal", "Diagnosis_number", "Diagnosis_name"))obs_start =data.frame(time =0, surv =1, upper =1, lower =1)reactiveData1 <-reactive({validate(need(input$age_1 !="", "Please input an age number in patient group 1"),need(input$duration_1 !="", "Please input a disease duration at diagnosis number in patient group 1"),need(input$ace_total_1 !="", "Please input an ACE-III total in patient group 1") ) sex_numeric_1 <-ifelse(input$sex_1 =="Male", 1, 2)data.frame(obs ="Patient Group 1",`Sex (Male 1, Female 2)`=as.integer(sex_numeric_1), `Age at diagnosis`= input$age_1, `Disease duration at diagnosis`= input$duration_1,`ACEIII::ACEIIITotal`= input$ace_total_1,diagnosis = input$diagnosis_1,check.names =FALSE)})reactiveData2 <-reactive({validate(need(input$age_2 !="", "Please input an age number in patient group 2"),need(input$duration_2 !="", "Please input a disease duration at diagnosis number in patient group 2"),need(input$ace_total_2 !="", "Please input an ACE-III total in patient group 2") ) sex_numeric_2 <-ifelse(input$sex_2 =="Male", 1, 2)data.frame(obs ="Patient Group 2",`Sex (Male 1, Female 2)`=as.integer(sex_numeric_2), `Age at diagnosis`= input$age_2, `Disease duration at diagnosis`= input$duration_2,`ACEIII::ACEIIITotal`= input$ace_total_2,diagnosis = input$diagnosis_2,check.names =FALSE)})#Make model for the diagnosis selected rv_model_1 =reactive({ filter_data_1 = data_cleaned %>%filter(Diagnosis_name ==reactiveData1()$diagnosis) cox_model <-coxph(Surv(`Diagnosis till death`) ~`Age at diagnosis`+`Sex (Male 1, Female 2)`+`Disease duration at diagnosis`+`ACEIII::ACEIIITotal`, data = filter_data_1, model =TRUE)}) rv_model_2 =reactive({ filter_data_2 = data_cleaned %>%filter(Diagnosis_name ==reactiveData2()$diagnosis) cox_model <-coxph(Surv(`Diagnosis till death`) ~`Age at diagnosis`+`Sex (Male 1, Female 2)`+`Disease duration at diagnosis`+`ACEIII::ACEIIITotal`, data = filter_data_2, model =TRUE) })#Fit the new data with the model and make a dataframesurv_fit_1 <-reactive({# req(rv_model_1()) fit1 <-survfit(rv_model_1(), newdata =reactiveData1()[2:5]) %>%surv_summary() %>%as.data.frame() fit1 =bind_rows(fit1, obs_start)fit1 = fit1%>%mutate(obs ="Patient Group 1") }) surv_fit_2 <-reactive({# req(rv_model_2()) fit2 <-survfit(rv_model_2(), newdata =reactiveData2()[2:5]) %>%surv_summary() %>%as.data.frame() fit2 =bind_rows(fit2, obs_start) %>%mutate(obs ="Patient Group 2") })new_patients <-reactive({ new_obs <-rbind(surv_fit_1(), surv_fit_2()) # Combine the results})output$dataTable <-renderTable({new_patients()})output$plot <-renderPlot({ggplot(data =new_patients()) +geom_step(aes(x = time, y = surv, color = obs), size =0.8) +geom_stepribbon(aes(x = time, ymin = lower, ymax = upper, fill = obs), alpha =0.2) +theme_bw() +ggtitle("Cox Proportional Hazards Regression on Survival in Dementia") +ylab("Survival Probability") +xlab("Years") +theme(legend.position ="right", legend.direction ="vertical", legend.key.size =unit(0.5, "cm"),legend.text =element_text(size =14), legend.title =element_text(size =0),legend.margin = ggplot2::margin(t =0, unit ="pt"),axis.title =element_text(face ="bold"),plot.title =element_text(face ="bold", size =16, hjust =0.5)) +scale_x_continuous(breaks =seq(min(new_patients()$time), max(new_patients()$time), by =1)) +theme(axis.text =element_text(size =12), # Set x and y tick sizeaxis.title =element_text(size =14)) })```## Significance of variables```{r}suppressMessages({suppressWarnings({library(rms) #Important: Leave library RMS here to not break shiny})})data$`CBI::PercentSleepFCorrected`=as.numeric(data$`CBI::PercentSleepFCorrected`)data$`CBI::PercentBeliefsFCorrected`=as.numeric(data$`CBI::PercentBeliefsFCorrected`)df <- data %>%select(-'PID', -`Diagnosis_name`, -'Diagnosis_number', -`Diagnosis till death`, -`Sex (Male 1, Female 2)`, -`Onset till death`, -`Original_order`, -'SummaryPrimaryDiagnosis_descriptor', -`DAD::ExportTotalIadls`, -`DAD::ExportTotalDad`, -`DAD::ExportTotalBadl`)data$`Sex (Male 1, Female 2)`=as.factor(data$`Sex (Male 1, Female 2)`)#Standarize datadata_standardized <-as.data.frame(lapply(df, scale), check.names =FALSE)data_stand <-cbind(data$Diagnosis_name, data$`Diagnosis till death`, data$`Sex (Male 1, Female 2)`, data_standardized)colnames(data_stand)[colnames(data_stand) =="data$Diagnosis_name"] <-"Diagnosis_name"colnames(data_stand)[colnames(data_stand) =="data$`Sex (Male 1, Female 2)`"] <-"Sex (Male 1, Female 2)"colnames(data_stand)[colnames(data_stand) =="data$`Diagnosis till death`"] <-"Diagnosis till death"data_stand = data_stand %>%drop_na()``````{r}# colnames(data_stand_no_high_vif)data_stand_no_high_vif = data_stand %>%select(-c(`ACEIII::ACEIIITotal`))fit <-survreg(Surv(`Diagnosis till death`) ~ ., data = data_stand_no_high_vif, dist ="lognormal")# fit <- coxph(Surv(`Diagnosis till death`) ~., data= data_stand_no_high_vif)vif <- rms::vif(fit)``````{r, fig.width= 16, fig.height=12}colnames(data_stand_no_high_vif) = gsub("FCorrected", "", colnames(data_stand_no_high_vif))colnames(data_stand_no_high_vif) = gsub("ClinicalAssessment::", "", colnames(data_stand_no_high_vif))corr<-cor(data_stand_no_high_vif[4:21])ggcorrplot(corr, type = "lower", ggtheme = ggplot2::theme_void, lab = TRUE, colors = c("#6D9EC1", "white", "#E46726") ) + scale_y_discrete(position='right')```### Overall significance for all dementia diagnosis```{r}summary_cox_full <-summary(fit)coefs <- summary_cox_full$table %>%as.data.frame()# coefs <- summary_cox_full$coefficients %>% as.data.frame()rownames(coefs)<-sub("`", "", rownames(coefs)) rownames(coefs)<-sub("`", " ", rownames(coefs)) colnames(coefs) <-c("Coefficient", "Std. Error", "z value", "p value")significant_vars <- coefs[coefs[, "p value"] <0.05, ]coefs_sorted <- coefs[order(abs(coefs[,"Coefficient"]), decreasing =TRUE), ]styled_table_for_full_model <- coefs_sorted %>%kable() %>%kable_styling(bootstrap_options =c("striped", "hover", "condensed"))``````{r}rows_to_remove <-c("(Intercept)", "Log(scale)", "Diagnosis_nameFTD-MND")significant_vars <- significant_vars[!rownames(significant_vars) %in% rows_to_remove, ]significant_vars$abso =abs(significant_vars$Coefficient)# Create the barplotggplot(significant_vars, aes(x =rownames(significant_vars), y = abso, fill = Coefficient >0)) +geom_col() +coord_flip() +scale_fill_manual(values =c("#6D9EC1", "#E46726"),labels =c("Negative Impact", "Positive Impact"),name ="Impact Type") +labs(title ="Impact of Significant Coefficients", x ="Factors", y ="Coefficients Impact") +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1))```### Significance of variabes by dementia group```{r}unique_dx =c("AD", "bvFTD", "CBS","FTD-MND","lvPPA","nfvPPA") coeffs_list =list()coeffs_significance =list()plot_diag =list()for (name in unique_dx) { data_stand_filtered = data_stand_no_high_vif %>%filter(Diagnosis_name == name) data_stand_filtered = data_stand_filtered %>%select(-Diagnosis_name) fit <-survreg(Surv(`Diagnosis till death`) ~ ., data = data_stand_filtered, dist ="lognormal") diagnosis_fit <-summary(fit) coefs <- diagnosis_fit$table %>%as.data.frame() coefs = coefs %>%drop_na()rownames(coefs)<-sub("`", "", rownames(coefs))rownames(coefs)<-sub("`", " ", rownames(coefs))colnames(coefs) <-c("Coefficient", "Std. Error", "z value", "p value") coefs_sorted <- coefs[order(abs(coefs[,"Coefficient"]), decreasing =TRUE), ] coeffs_list[[name]] = coefs_sorted significant_vars <- coefs_sorted[coefs_sorted[, "p value"] <0.05, ] rows_to_remove <-c("(Intercept)", "Log(scale)") significant_vars <- significant_vars[!rownames(significant_vars) %in% rows_to_remove, ] significant_vars$Variable <-rownames(significant_vars) coeffs_significance[[name]] = significant_vars significant_vars$abso =abs(significant_vars$Coefficient) p =ggplot(significant_vars, aes(x = Variable, y = abso, fill = Coefficient >0)) +geom_col() +coord_flip() +scale_fill_manual(values =c("#6D9EC1", "#E46726"),labels =c("Negative Impact", "Positive Impact"),name ="Impact Type") +labs(title =paste0("Impact of Significant Coefficients in ", name), x ="Factors", y ="Coefficients Impact") +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1)) plot_diag[[name]] =list(plot = p)}```::: panel-tabset```{r, results='asis'}#| warning: falseiwalk(plot_diag, ~ { cat('## ', .y, '\n\n') print(.x$plot) cat("\n\n") })```:::```{r}# combined_df = do.call(rbind, coeffs_significance)# ggplot(combined_df, aes(x = Variable, y = Coefficient)) +# geom_col() + # geom_col uses y values directly for bar heights# facet_wrap(~name) + # Faceting by 'diagnosis'# theme_minimal() + # labs(# title = "Coefficient by Variable across Diagnoses",# x = "Variable",# y = "Coefficient"# ) +# theme(axis.text.x = element_text(angle = 45, hjust = 1)) ```## Appendix### Testing Proportional Harzard:::{.panel-tabset}```{r, results='asis'}#| warning: falseheadings <- names(test_ph_list)for (i in seq_along(test_ph_list)) { cat("# ", headings[i], "\n") print(kable(test_ph_list[[i]], format = "markdown") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))) cat("\n\n")}```:::### Results from all dementia diagnosis```{r}styled_table_for_full_model ```### Results from each dementia diagnosis :::{.panel-tabset}```{r, results='asis'}#| warning: falseheadings <- names(coeffs_list)for (i in seq_along(coeffs_list)) { cat("# ", headings[i], "\n") print(kable(coeffs_list[[i]], format = "markdown") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))) cat("\n\n")}```:::